Report on the XML Mining Track at INEX 2005 and INEX 2006, Categorization and Clustering of XML Documents
Abstract
This article reports on the two years of the XML Mining track at INEX (2005 and 2006). We focus here on the classification and clustering of XML documents. We detail these two tasks and the corpus used for this challenge, and then present a summary of the different methods proposed by the participants. Finally, we compare the results obtained during the two years of the track.
Similar resources
INEX REPORT Report on the XML Mining Track at INEX 2005 and INEX 2006 Categorization and Clustering of XML Documents
This article is a report concerning the two years of the XML Mining track at INEX (2005 and 2006). We focus here on the classification and clustering of XML documents. We detail these two tasks and the corpus used for this challenge and then present a summary of the different methods proposed by the participants. Finally, we compare the results obtained during the two years of the track.
Unsupervised Classification of Text-Centric XML Document Collections
This paper addresses the problem of the unsupervised classification of text-centric XML documents. In the context of the INEX mining track 2006, we present methods to exploit the inherent structural information of XML documents in the document clustering process. Using the k-means algorithm, we have experimented with a couple of feature sets, to discover that a promising direction is to use str...
UJM at INEX 2008 XML Mining Track
This paper reports our experiments carried out for the INEX XML Mining track, consisting in developing categorization (or classification) and clustering methods for XML documents. We represent XML documents as vectors of indexed terms. For our first participation, the purpose of our experiments is twofold: Firstly, our overall aim is to set up a categorization text only approach that can be use...
UJM at INEX 2009 XML Mining Track
This paper reports our experiments carried out for the INEX XML Mining track 2009, which consists of developing categorization methods for multi-labeled XML documents. We represent XML documents as vectors of indexed terms. The purpose of our experiments is twofold: firstly, we aim to compare strategies that reduce the index size using an improved feature selection criterion, CCD. Secondly, we compare...
XML Document Mining Using Contextual Self-organizing Maps for Structures
XML is becoming increasingly popular as a language for representing many types of electronic documents. The consequence of the strict structural document description via XML is that a relatively new task in mining documents based on structural and/or content information has emerged. In this paper we investigate (1) the suitability of new unsupervised machine learning methods for the clustering ...